Duplicate Removal in Information Dissemination

Authors

  • Tak W. Yan
  • Hector Garcia-Molina
Abstract

Our experience with the SIFT [YGM95] information dissemination system (in use by over 7,000 users daily) has identified an important and generic dissemination problem: duplicate information. In this paper we explain why duplicates arise, we quantify the problem, and we discuss why it impairs information dissemination. We then propose a Duplicate Removal Module (DRM) for an information dissemination system. The removal of duplicates operates on a per-user, per-document basis: each document read by a user generates a request, or a duplicate restraint. In wide-area environments, the number of restraints handled is very large. We consider the implementation of a DRM, examining alternative algorithms and data structures that may be used. We present a performance evaluation of the alternatives and answer important design questions such as: Which implementation is the best? With the "best" scheme, how expensive will duplicate removal be? How much memory is required? How fast can restraints be processed?
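
The per-user, per-document restraint idea can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, method names, and the choice of an in-memory hash set per user are assumptions made for illustration, and the sketch further assumes that duplicate copies of a document share one identifier.

```python
from collections import defaultdict


class DuplicateRemovalModule:
    """Hypothetical restraint store; names and layout are illustrative only."""

    def __init__(self):
        # user id -> set of document ids that user has already read
        self._restraints = defaultdict(set)

    def add_restraint(self, user, doc_id):
        """Record one duplicate restraint: `user` has read `doc_id`."""
        self._restraints[user].add(doc_id)

    def filter(self, user, doc_ids):
        """Return the ids in `doc_ids` with no restraint for `user`, in order."""
        # .get avoids creating an empty entry for users with no restraints
        seen = self._restraints.get(user, set())
        return [d for d in doc_ids if d not in seen]

    def restraint_count(self):
        """Total number of restraints held across all users."""
        return sum(len(docs) for docs in self._restraints.values())
```

In this sketch memory grows with the total number of (user, document) pairs, which is the kind of cost the abstract's questions about memory and processing speed are concerned with.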

Similar resources

Efficient and Scalable Information Dissemination in Mobile Ad hoc Networks

Efficient and scalable information dissemination is important in network management, packet routing and auto configuration. Given the very nature of a mobile ad hoc network, this information dissemination provides a challenging problem. This paper introduces a new packet flooding mechanism for information dissemination in mobile ad hoc networks. Neighbor Aware Adaptive Power (NAAP) flooding, a...

Duplicate Removal for Candidate Answer Sentences

In this paper, we describe the duplicate removal component of Infolab’s question answering system that contributed to CSAIL’s entry of TREC-15 Question Answering track. The goal of the Question Answering Track is to provide short, succinct answers to English sentences posed by users. In answering definition questions, we are asked to retrieve new and relevant information, in the form of short...

Application of Information and Communication Technology for Dissemination of Agricultural Information among Farmers: Challenges and Opportunities

Agriculture is the backbone of India’s economy, as two-thirds of the population live in rural areas and directly or indirectly depend on agriculture for their livelihood. India’s food production has improved significantly during the last three decades due to all-round efforts, but Indian agriculture still faces a multitude of problems in maximizing productivity to feed the continuously increasin...

Comparative Analysis of Information Dissemination Capabilities of Media and Social Networks

Background and Aim: Human knowledge depends on data and information that emerges and is transferred through different channels. The dissemination process differs in type, form of transfer, and distribution, depending on information or awareness. This survey compares librarians' and information scientists' information-transferring capabilities in mass media and social networks. Methods: This ...

Library and dissemination of health information

Purpose: The aim of this research is to determine the role of public and academic libraries in disseminating health information and to compare the two types of libraries with each other. Method: The research is applied, using a descriptive-survey method. The sample size, 379 people, was determined using the Morgan table. The questionnaires were distributed using the quota sampling method. ...

Publication date: 1995